AITopics | self-supervised audiovisual matching


Review for NeurIPS paper: Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

Neural Information Processing Systems, Jan-25-2025, 17:41:29 GMT

Additional Feedback: The paper presents a framework for localizing sounding objects in an audiovisual scene. Overall, I liked the paper. The proposed approach is neat and makes sense for the most part. I have a few points of concern and would like to see the authors' responses to them. I would be happy to raise my overall score if the responses are satisfactory.

neurips paper, object localization, self-supervised audiovisual matching


Technology: Information Technology > Artificial Intelligence (0.39)


Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

Neural Information Processing Systems, Oct-10-2024, 12:56:58 GMT

Discriminatively localizing sounding objects in cocktail-party scenes, i.e., scenes with mixed sounds, is commonplace for humans but still challenging for machines. In this paper, we propose a two-stage learning framework to perform self-supervised class-aware sounding object localization. First, we propose to learn robust object representations by aggregating the candidate sound localization results in single-source scenes. Then, class-aware object localization maps are generated in the cocktail-party scenarios by referring to the pre-learned object knowledge, and the sounding objects are accordingly selected by matching audio and visual object category distributions, where the audiovisual consistency is viewed as the self-supervised signal. Experimental results on both realistic and synthesized cocktail-party videos demonstrate that our model is superior in filtering out silent objects and pointing out the location of sounding objects of different classes.
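The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of global average pooling over each class-aware map, and the choice of KL divergence as the consistency measure are all assumptions made here for illustration, since the abstract only states that audio and visual category distributions are matched.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of scores."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def visual_category_distribution(loc_maps):
    """Turn class-aware localization maps of shape (K, H, W) into a
    distribution over K object categories.
    Assumption: global average pooling followed by softmax."""
    pooled = loc_maps.reshape(loc_maps.shape[0], -1).mean(axis=1)
    return softmax(pooled)

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two discrete distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def matching_loss(loc_maps, audio_logits):
    """Hypothetical audiovisual consistency loss: small when the categories
    predicted from the audio agree with the categories that dominate the
    visual localization maps."""
    p_visual = visual_category_distribution(loc_maps)
    p_audio = softmax(audio_logits)
    return kl_divergence(p_audio, p_visual)

# Toy example: three categories, category 0 is strongly activated visually.
loc_maps = np.zeros((3, 4, 4))
loc_maps[0] = 2.0
consistent = matching_loss(loc_maps, np.array([2.0, 0.0, 0.0]))
inconsistent = matching_loss(loc_maps, np.array([0.0, 0.0, 2.0]))
```

In this toy setup the loss is lower when the audio scores favor the same category as the visual maps, which is the sense in which audiovisual consistency could act as a self-supervised signal.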

object localization, self-supervised audiovisual matching


Technology: Information Technology > Artificial Intelligence (0.45)
